Object detection
is a widely studied problem. In this paper, however, we turn to the more challenging problem of “Covered Object Reasoning”, which aims to infer the category label of a target object in a given image even when that object is totally covered (or invisible). To resolve this problem, we propose
CoBjeason
to seize the opportunity when visual reasoning meets the knowledge graph: “empirical cognition” of common visual contexts is incorporated as a knowledge graph, over which two collaborative agents conduct reinforced multi-hop reasoning. The first agent stands at the covered object (i.e., the unknown entity) to observe the surrounding visual cues in the given image, and gradually selects entities and relations from a global gallery-level knowledge graph, which contains entity pairs that frequently co-occur across the entire image collection, so as to infer the main structure of an image-level knowledge graph expanded forward from the unknown entity
. The second agent, in turn, aggregates the semantic context among entities of the reasoned image-level knowledge graph backward into the unknown entity, and then selects an appropriate entity from the global gallery-level knowledge graph as the reasoning result. Moreover, the two agents collaborate with each other, ensuring that the above Forward & Backward Reasoning steps towards the same destination: higher performance on covered object reasoning. To the best of our knowledge, this is the first work on
Covered Object Reasoning
with Knowledge Graphs and reinforced Multi-Agent collaboration. In particular, our study on
Covered Object Reasoning
and the proposed model
CoBjeason
could offer novel insights into more basic Computer Vision (CV) tasks, such as Semantic Segmentation with a better understanding of the current scene when some objects are blurred or covered, Visual Question Answering with enhanced inference in more complicated visual contexts where some objects are covered or invisible, and Image Caption Generation with a richer visual context for images containing partially visible objects. Improvements on these basic CV tasks can further benefit more complicated applications involving nuanced visual interpretation, such as Autonomous Driving, where recognition of and reasoning about partially visible or covered objects are critical. According to the experimental results, our proposed
CoBjeason
achieves the best overall ranking performance on covered object reasoning compared with other models, while enjoying a lower “
exploration cost
”, insensitivity to long-tail covered objects, and acceptable time complexity.
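To make the Forward & Backward Reasoning loop concrete, the toy sketch below illustrates the idea under strong simplifications: the gallery-level knowledge graph is a hand-written co-occurrence table, the forward agent greedily follows gallery edges instead of performing learned reinforced multi-hop selection, and the backward agent aggregates plain edge weights instead of learned semantic context. All names and scores here are hypothetical stand-ins, not CoBjeason's actual model.

```python
# Toy sketch of Forward & Backward reasoning over a gallery-level KG.
# The KG, scoring, and aggregation are illustrative stand-ins only.
from collections import defaultdict

# Gallery-level KG: (entity, entity) -> co-occurrence count across the image collection.
GALLERY_KG = {
    ("desk", "monitor"): 40, ("desk", "keyboard"): 35,
    ("keyboard", "mouse"): 30, ("monitor", "keyboard"): 25,
    ("desk", "lamp"): 10, ("road", "car"): 50,
}

def neighbors(entity):
    """Entities linked to `entity` in the gallery-level KG, with edge weights."""
    out = {}
    for (a, b), w in GALLERY_KG.items():
        if a == entity:
            out[b] = max(out.get(b, 0), w)
        if b == entity:
            out[a] = max(out.get(a, 0), w)
    return out

def forward_expand(visible_entities, hops=1):
    """Forward agent (stand-in): expand an image-level KG from the visible
    context by following gallery-level edges for a fixed number of hops."""
    graph = set()
    frontier = list(visible_entities)
    for _ in range(hops):
        nxt = []
        for e in frontier:
            for n, _ in neighbors(e).items():
                if (e, n) not in graph and (n, e) not in graph:
                    graph.add((e, n))
                    nxt.append(n)
        frontier = nxt
    return graph

def backward_aggregate(graph, visible_entities):
    """Backward agent (stand-in): aggregate edge weights from the visible
    context back into candidate labels for the unknown entity."""
    scores = defaultdict(float)
    context = set(visible_entities)
    for (a, b) in graph:
        w = GALLERY_KG.get((a, b), GALLERY_KG.get((b, a), 0))
        if a in context and b not in context:
            scores[b] += w
        if b in context and a not in context:
            scores[a] += w
    return max(scores, key=scores.get) if scores else None

# Covered object sits on a desk next to a monitor: under this toy KG,
# "keyboard" accumulates the highest backward score (35 + 25).
visible = ["desk", "monitor"]
image_kg = forward_expand(visible, hops=1)
print(backward_aggregate(image_kg, visible))  # keyboard
```

In the actual paper, both the forward edge selection and the backward aggregation are learned policies trained with reinforcement and coordinated between the two agents; the greedy expansion and weight summation above only mirror the direction of information flow.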